More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server

Authors

Qirong Ho

James Cipar

Henggang Cui

Seunghak Lee

Jin Kyu Kim

Phillip B. Gibbons

Garth A. Gibson

Gregory R. Ganger

Eric P. Xing

Abstract

We propose a parameter server system for distributed ML, which follows a Stale Synchronous Parallel (SSP) model of computation that maximizes the time computational workers spend doing useful work on ML algorithms, while still providing correctness guarantees. The parameter server provides an easy-to-use shared interface for read/write access to an ML model's values (parameters and variables), and the SSP model allows distributed workers to read older, stale versions of these values from a local cache, instead of waiting to get them from a central storage. This significantly increases the proportion of time workers spend computing, as opposed to waiting. Furthermore, the SSP model ensures ML algorithm correctness by limiting the maximum age of the stale values. We provide a proof of correctness under SSP, as well as empirical results demonstrating that the SSP model achieves faster algorithm convergence on several different ML problems, compared to fully-synchronous and asynchronous schemes.
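The bounded-staleness rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `SSPClock`, its method names, and the exact rule used here (a worker may start an iteration only while it is at most `staleness` clocks ahead of the slowest worker) are assumptions made for illustration. Setting `staleness=0` makes the sketch behave like a fully synchronous barrier.

```python
class SSPClock:
    """Hypothetical helper: tracks per-worker clocks under a staleness bound."""

    def __init__(self, num_workers, staleness):
        self.clocks = [0] * num_workers  # iterations completed by each worker
        self.staleness = staleness       # assumed maximum allowed clock gap

    def may_compute(self, worker):
        # Assumed rule: a worker may start its next iteration only while it
        # is at most `staleness` clocks ahead of the slowest worker.
        return self.clocks[worker] - min(self.clocks) <= self.staleness

    def finish_iteration(self, worker):
        # Completing an iteration advances that worker's clock by one.
        if not self.may_compute(worker):
            raise RuntimeError("worker must wait for slower workers")
        self.clocks[worker] += 1


def iterations_before_blocking(clock, worker):
    """Run one worker alone until the staleness bound forces it to wait."""
    done = 0
    while clock.may_compute(worker):
        clock.finish_iteration(worker)
        done += 1
    return done
```

With two workers and `staleness=2`, a fast worker can run ahead for a few iterations using possibly stale values, then has to wait until the slow worker completes an iteration. This is the sense in which the age of stale values stays bounded.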

Related resources

On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System

With ever growing data volume and model size, an error-tolerant, communication efficient, yet versatile parallel algorithm has become a vital part for the success of many large-scale applications. In this work we propose msPG, an extension of the flexible proximal gradient algorithm to the model parallel and stale synchronous setting. The worker machines of msPG operate asynchronously as long a...

Exploiting Bounded Staleness to Speed Up Big Data Analytics

Many modern machine learning (ML) algorithms are iterative, converging on a final solution via many iterations over the input data. This paper explores approaches to exploiting these algorithms’ convergent nature to improve performance, by allowing parallel and distributed threads to use loose consistency models for shared algorithm state. Specifically, we focus on bounded staleness, in which e...

High-Performance Distributed ML at Scale through Parameter Server Consistency Models

As Machine Learning (ML) applications increase in data size and model complexity, practitioners turn to distributed clusters to satisfy the increased computational and memory demands. Unfortunately, effective use of clusters for ML requires considerable expertise in writing distributed code, while highly-abstracted frameworks like Hadoop have not, in practice, approached the performance seen in...

Analysis of High-Performance Distributed ML at Scale through Parameter Server Consistency Models

As Machine Learning (ML) applications embrace greater data size and model complexity, practitioners turn to distributed clusters to satisfy the increased computational and memory demands. Effective use of clusters for ML programs requires considerable expertise in writing distributed code, but existing highly-abstracted frameworks like Hadoop that pose low bar...

Probabilistic Synchronous Parallel

Most machine learning and deep neural network algorithms rely on certain iterative algorithms to optimise their utility/cost functions, e.g. Stochastic Gradient Descent (SGD). In distributed learning, the networked nodes have to work collaboratively to update the model parameters, and the way they proceed is referred to as synchronous parallel design (or barrier control). Synchronous parall...

Journal title:

Advances in Neural Information Processing Systems

Volume: 2013

Publication date: 2013
